Author Diarization Using Cluster-Distance Approach
Abstract
Author diarization is a new task introduced at PAN'16: identifying the portion(s) of text within a document written by multiple authors. This paper presents our proposed approach to the author diarization task. Various types of stylistic features, including lexical features, were used to uniquely identify an author. Furthermore, the ClustDist method was used to find anomalous text within a single document. Finally, clusters were generated using a simple k-means clustering algorithm. Experiments were performed on both the training and testing data sets. It was observed that promising results can be achieved by changing the length of the text fragments.
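The pipeline outlined in the abstract (split the document into fragments, describe each fragment with stylistic features, then group fragments with k-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the ClustDist step, so it is omitted here, and the three lexical features, the function names (`split_fragments`, `lexical_features`, `standardize`, `kmeans`, `diarize`) and the `words_per_fragment` parameter are all assumptions chosen for illustration.

```python
import random
import re


def split_fragments(text, words_per_fragment):
    """Cut the document into consecutive fragments of a fixed word count."""
    words = text.split()
    return [" ".join(words[i:i + words_per_fragment])
            for i in range(0, len(words), words_per_fragment)]


def lexical_features(fragment):
    """Three illustrative lexical features (assumed, not taken from the paper)."""
    words = re.findall(r"[A-Za-z']+", fragment.lower())
    n_words = max(len(words), 1)
    avg_word_len = sum(len(w) for w in words) / n_words
    type_token_ratio = len(set(words)) / n_words
    punct_rate = sum(ch in ",;:.!?" for ch in fragment) / max(len(fragment), 1)
    return [avg_word_len, type_token_ratio, punct_rate]


def standardize(rows):
    """Z-score each feature column so no single feature dominates distances."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members)
                                      for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def diarize(text, n_authors, words_per_fragment):
    """Return the fragments and a cluster label (candidate author) for each."""
    fragments = split_fragments(text, words_per_fragment)
    features = standardize([lexical_features(f) for f in fragments])
    return fragments, kmeans(features, n_authors)
```

Calling `diarize(document_text, n_authors=2, words_per_fragment=100)` returns the fragments alongside one cluster label per fragment. Varying `words_per_fragment` is the knob that corresponds to the abstract's observation about fragment length: shorter fragments give finer boundaries but noisier features.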
Related resources
The Approach of Mean Shift based Cosine Dissimilarity for Multi-Recording Speaker Clustering
Speaker clustering is an important task in many applications such as Speaker Diarization as well as Speech Recognition. Speaker clustering can be done within a single multispeaker recording (Diarization) or for a set of different recordings. In this work we are interested in the former case and we propose a simple iterative Mean Shift (MS) algorithm. MS algorithm is based on Euclidean distance....
Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs
In this paper, we present an approach for speaker diarization based on segmentation followed by bottom-up clustering, where clusters are modeled using adapted Gaussian mixture models. We propose a novel inter-cluster distance in the model parameter space which is easily computable and which can both be used as the dissimilarity measure in the clustering scheme and as a stop criterion. Using ada...
Phonetic subspace mixture model for speaker diarization
This paper presents an improved distance measure for speaker clustering in speaker diarization systems. The proposed phonetic subspace mixture (PSM) model introduces phonetic information to the BIC distance measure. Therefore, the new PSM model-based BIC distance measure can remove the effect of phonetic content on the diarization results. The typical BIC distance measure can be seen as a speci...
Speaker diarization using divide-and-conquer
Speaker diarization systems usually consist of two core components: speaker segmentation and speaker clustering. The current state-of-the-art speaker diarization systems usually apply hierarchical agglomerative clustering (HAC) for speaker clustering after segmentation. However, HAC’s quadratic computational complexity with respect to the number of data samples inevitably limits its application...
T-test distance and clustering criterion for speaker diarization
In this paper, we present an application of Student's t-test to measure the similarity between two speaker models. The measure is evaluated by comparing with other distance metrics: the Generalized Likelihood Ratio, the Cross Likelihood Ratio and the Normalized Cross Likelihood Ratio in a speaker detection task. We also propose an objective criterion for speaker clustering. The criterion deduces ...
Publication date: 2016